Subspace Clustering for all Seasons

Authors

  • Sergio Peignier
  • Christophe Rigotti
  • Guillaume Beslon
Abstract

Subspace clustering is recognized as a more general and more difficult task than standard clustering, since it requires identifying not only objects that share similar feature values but also the various subspaces in which these similarities appear. Many approaches to subspace clustering have been investigated in the literature, using various clustering paradigms. The reader is referred, for instance, to [5] for detailed reviews and comparisons of the best methods and main categories. Although many evolutionary clustering approaches exist [4], very few of them address the subspace clustering problem, and those that do still include non-evolutionary stages [7,8]. According to [1], bio-inspired optimization algorithms could be improved by incorporating knowledge from molecular and evolutionary biology. A promising source of advances in optimization is one of the important phenomena in evolutionary biology: the dynamic evolution of the genome structure. Several studies have shown, for instance, that an evolvable genome structure allows evolution to modify the effects that evolution operators (e.g., mutations) have on individuals, a phenomenon known as evolution of evolution [3]. In this paper, we present Chameleoclust, an evolutionary subspace clustering algorithm that incorporates a genome with an evolvable structure. The genome is coarse-grained, inspired by [2], and defined as a list of tuples (the "genes"), each tuple containing numbers. These tuples are mapped at the phenotype level to denote core point locations in different dimensions, which are then used collectively to build the subspace clusters by grouping the data around the core points. The biological analogy is that each gene codes for a molecular product, and that the combination of molecular products associated together codes for a function, i.e., a cluster.
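The genome-to-phenotype mapping described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only says that each gene is a tuple of numbers, so the four-field layout (functional flag, cluster id, dimension, value), the summing of contributions, and the distance function are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical gene layout (an assumption, not given in the abstract):
# (functional, cluster_id, dimension, value)

def decode(genome):
    """Map functional genes to core points: {cluster_id: {dimension: location}}.

    Contributions of genes targeting the same cluster and dimension are summed
    (an illustrative choice).
    """
    core_points = defaultdict(lambda: defaultdict(float))
    for functional, cluster_id, dimension, value in genome:
        if functional:  # non-functional genes do not affect the phenotype
            core_points[cluster_id][dimension] += value
    return {c: dict(dims) for c, dims in core_points.items()}

def distance(point, core):
    """Mean absolute difference over the core point's own dimensions only."""
    return sum(abs(point[d] - x) for d, x in core.items()) / len(core)

def assign(data, core_points):
    """Group each data point with its closest core point."""
    return [min(core_points, key=lambda c: distance(p, core_points[c]))
            for p in data]

genome = [(1, 0, 0, 1.0), (1, 0, 0, 0.5), (0, 0, 1, 9.0), (1, 1, 1, 4.0)]
cores = decode(genome)
labels = assign([(1.4, 0.0), (9.0, 4.1)], cores)
```

In this sketch each core point is defined only on the dimensions its genes mention, so the subspace of a cluster falls out of the genome content rather than being fixed in advance.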
To allow for evolution of evolution, the Chameleoclust genome contains a variable proportion of functional elements, and is subject to local mutations and to large random rearrangements, namely large deletions, duplications and translocations. Local mutations and rearrangements may thus modify not only the genome elements but also the genome structure. The key intuition in the design of the Chameleoclust algorithm is to take advantage of such an evolvable structure to detect various numbers of clusters in subspaces of various dimensions and to self-tune the main evolutionary parameters (e.g., levels of variability). Figure 1 illustrates the Chameleoclust genome structure (genome length and proportion of functional elements) and fitness convergence on a synthetic benchmark dataset. The reader is referred to [6] for a detailed description of Chameleoclust. The algorithm has been assessed using a …
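The three large rearrangements named above can be illustrated on a genome stored as a Python list. This is only a sketch under assumptions: the function names, the uniform choice of cut points, and the rule that a deletion never empties the genome are all hypothetical and are not taken from the paper.

```python
import random

def _segment(n, rng):
    """Pick two distinct cut points with 0 <= i < j <= n."""
    i, j = sorted(rng.sample(range(n + 1), 2))
    return i, j

def large_deletion(genome, rng):
    """Remove a random contiguous segment (never the whole genome)."""
    if not genome:
        return []
    i, j = _segment(len(genome), rng)
    remaining = genome[:i] + genome[j:]
    return remaining if remaining else list(genome)

def duplication(genome, rng):
    """Copy a random segment and insert the copy at a random position."""
    if not genome:
        return []
    i, j = _segment(len(genome), rng)
    k = rng.randrange(len(genome) + 1)
    return genome[:k] + genome[i:j] + genome[k:]

def translocation(genome, rng):
    """Cut a random segment out and reinsert it elsewhere."""
    if not genome:
        return []
    i, j = _segment(len(genome), rng)
    segment, rest = genome[i:j], genome[:i] + genome[j:]
    k = rng.randrange(len(rest) + 1)
    return rest[:k] + segment + rest[k:]

rng = random.Random(0)
genome = [(1, 0, 0, 1.0), (0, 0, 1, 9.0), (1, 1, 1, 4.0), (0, 2, 0, 3.0)]
shorter = large_deletion(genome, rng)
longer = duplication(genome, rng)
shuffled = translocation(genome, rng)
```

Because deletions and duplications change the genome length, and duplications can copy functional and non-functional genes alike, operators of this kind alter the genome structure as well as its content, which is the property the abstract relies on.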

Related resources

Hierarchical Subspace Clustering

It is well known that traditional clustering methods considering all dimensions of the feature space usually fail in terms of efficiency and effectiveness when applied to high-dimensional data. This poor behavior stems from the fact that clusters may not be found in the high-dimensional feature space, although clusters exist in subspaces of the feature space. To overcome these limitations of tra...

Feature Selection based Semi-Supervised Subspace Clustering

Clustering is the process of assigning a set of n objects to clusters (groups). Dimensionality reduction techniques help increase the accuracy of clustering results by removing redundant and irrelevant dimensions. In most situations, however, objects can be related in different ways in different subsets of the dimensions. Dimensionality reduction tends to get rid of such rel...

Finding and Visualizing Subspace Clusters of High Dimensional Dataset Using Advanced Star Coordinates

Analysis of high-dimensional data has been a research area for many years. Analysts can detect similarity of data points within a cluster. Subspace clustering detects useful dimensions when clustering high-dimensional datasets. Visualization allows better insight into subspace clusters. However, displaying such high-dimensional database clusters on a 2-dimensional display is a challenging task. We p...

Subspace clustering for complex data

Clustering is an established data mining technique for grouping objects based on their mutual similarity. Since in today’s applications, however, usually many characteristics for each object are recorded, one cannot expect to find similar objects by considering all attributes together. In contrast, valuable clusters are hidden in subspace projections of the data. As a general solution to this p...

Evaluating Subspace Clustering Algorithms

Clustering techniques often define the similarity between instances using distance measures over the various dimensions of the data [12, 14]. Subspace clustering is an extension of traditional clustering that seeks to find clusters in different subspaces within a dataset. Traditional clustering algorithms consider all of the dimensions of an input dataset in an attempt to learn as much as possi...

Publication date: 2015